Information access

ABSTRACT

According to the present invention, apparatus and methods are provided to enable a user to locate and retrieve sets of information relevant to search criteria specified in a search query submitted by the user. Search results include not only a list of information sets matching the search criteria, but also the preserved structure of any tags used in annotating the information sets according to a structured mark-up language such as XML. A user may select a tag from a presented list of the returned tag structures, and the apparatus lists those documents containing the selected tag. The list of tags is then adjusted to include the selected tag and any other of the returned tags contained in the listed documents. Further tag selection from the adjusted list leads to a further refinement of the listed documents, enabling the user to navigate the search results on the basis of tag information.

[0001] This invention relates to information access and finds particular application in locating information contained in documents that have been annotated using a structured markup language.

[0002] To assist in locating information stored, for example, in a computer-based distributed file store, search engines of various types have been implemented in software to assist with identifying data sets that contain information of at least some relevance to a user's search criteria. To assist with information location, search engines are often able to make use of already constructed indexes to particular fields or domains of information, or to exploit summary or keyword data stored within data sets themselves.

[0003] However, it is often necessary for a search engine to analyse the contents of a data set to try to determine its primary information content and to assess the relevance of that information to the user's requirements. This is a more or less difficult task, according to the way the information is presented and structured.

[0004] In the context of a distributed information store such as that provided by the World Wide Web (known as the “web”), a markup language has been developed and standardised to improve identification of and access to information contained in web pages. The Hypertext Markup Language (HTML) used to annotate web pages includes a <META> tag for use in identifying a list of keywords provided by the web page author and indicative of the information content of the web page. Search engines may search for a <META> tag within a web page and compare any associated keywords with a user's search criteria to determine whether or not the information in the page is likely to be relevant.

[0005] More recently, a mark-up language called Extensible Markup Language (XML) has been developed to provide a more flexible and structured means for annotating information. One of the biggest potential benefits of XML is its ability to improve the accuracy of searches through the millions of documents now stored on intranets and the Internet. Exploitation of meta-information provided by XML tagging has the potential to dramatically reduce the number of irrelevant hits returned compared with current HTML-based search engines. However, whereas all tags within the HTML markup language are standardised, XML tags are, but for a small core of standard tags, entirely user-definable. To some extent, the usefulness of XML tagging is therefore subject to the skills of a document author. However, XML does allow user communities, from industry groups to single users, to develop an individual mark-up language that best suits their needs. In order to coordinate proposals for XML standards, in e-commerce applications for example, the Organisation for the Advancement of Structured Information Standards (OASIS) has created the Web Portal “XML.org”.

[0006] A known XML search engine such as “GoXML” provides a largely conventional keyword-based search facility to locate relevant information in conventional web pages as well as XML tagged documents. Where XML documents are located in a search, GoXML compiles and presents a flat list of the tags that mark up document parts within which search keywords were found, together with a conventional list of references to those documents. The user can then explore this list of “hit” tags by selecting a particular tag, causing the document list to be reduced to only those documents where a search keyword was found to occur in a part marked up by the selected tag. However, GoXML does not carry out further analysis of “hit” tags to enable a user to fully exploit the potential contextual information provided by those tags and to navigate the search results more effectively.

[0007] According to a first aspect of the present invention there is provided a method of accessing sets of information stored in an information system, wherein portions of said sets of information are enclosed by tags of a hierarchical tag structure defined according to a structured mark-up language, the method comprising the steps of:

[0008] (i) generating a search query comprising specified search criteria;

[0009] (ii) identifying portions of said sets of information matching said specified search criteria, and outputting a list of references to said identified sets of information;

[0010] (iii) identifying, for each matching portion identified at step (ii), an enclosing tag structure and outputting a list of said identified tag structures;

[0011] (iv) receiving a selection signal specifying a tag structure from the list output at step (iii);

[0012] (v) adjusting said list of references from step (ii) to comprise references only to said identified sets of information that contain the tag structure selected at step (iv);

[0013] (vi) adjusting said list of tag structures to comprise tag structures contained in information sets referenced in said adjusted list at step (v); and

[0014] (vii) repeating step (iv) in respect of said adjusted list of tag structures, and step (v) to identify a more specific list of references to sets of information.

[0015] According to preferred embodiments of the present invention, apparatus and methods are provided to enable a user to locate and retrieve sets of information relevant to search criteria specified in a search query submitted by the user. In particular, as for all embodiments of the present invention, apparatus and methods are designed to enable the user to exploit contextual information provided within documents that have been annotated using tags defined according to a structured markup language such as XML. Besides locating portions of a document that appear to match the user's search criteria, embodiments of the present invention enable the user to use XML or other markup language tags, inserted into a document by the author, to help identify those documents from a potentially large set of search results that are most relevant to the original search query or, more particularly, to what the user hoped to find.

[0016] Embodiments of the present invention are largely concerned with analysis of search results, enabling a user to exploit contextual information provided by markup language annotations in documents identified in the search. Largely conventional search engines and search techniques may be used to obtain a set of search results on the basis of a user's search query. However, the otherwise conventional search engine or other information retrieval tool must be arranged not only to locate portions of documents matching a user's search query, but also to identify and return annotating tags associated with those matching portions, according to the particular markup language used. In particular, the structure of annotating tags used in a particular structured markup language must be identified and returned in the search results, preserving that tag structure for analysis by novel and inventive features of the present invention, to be described in detail below.

[0017] Preferably, the method of said first aspect includes the steps of:

[0018] (viii) detecting, following receipt of the selection signal at step (iv), a request for access to a corresponding set of information listed in step (v);

[0019] (ix) updating, in respect of the tag structure selected at step (iv), a weighting value representative of the probability that selection of the tag structure led to a request for access to a corresponding set of information; and

[0020] (x) outputting an ordered list of the tag structures identified at step (iii) according to their respective weighting values.

[0021] In this preferred embodiment, the method provides a further enhancement to the tag analysis process by monitoring, over a period of time, the selection of tags by users from each presented tag list and monitoring any subsequent access by a user of particular documents listed in the resultant reduced document lists. The apparatus records a history of tag selection by users in general, or by a particular user or group of users, and their subsequent document retrieval activity in respect of each distinct tag and/or tag structure. This historical data is then used to weight each of the distinct tags and tag structures according to the likelihood that they resulted in a selection of documents relevant to those users. The apparatus is then able to present a given tag list in a ranking order, for example of decreasing usefulness, when particular tags known from the historical records appear in a set of search results.

[0022] There now follows, by way of example only, a detailed description of specific embodiments of the present invention. This description is to be read in conjunction with the accompanying drawings, of which:

[0023] FIG. 1 is a diagram showing features of an information searching apparatus according to a preferred embodiment of the present invention;

[0024] FIG. 2 is a flow chart showing steps in operation of an information searching apparatus according to a first embodiment of the present invention; and

[0025] FIG. 3 is a flow diagram showing steps in operation of a context analysis module according to a first embodiment of the present invention.

OVERVIEW OF PREFERRED EMBODIMENTS

[0026] Before describing a number of preferred embodiments of the present invention in detail, these embodiments will first be described in overview.

[0027] According to preferred embodiments of the present invention, apparatus and methods are provided to enable a user to locate and retrieve sets of information relevant to search criteria specified in a search query submitted by the user. In particular, as for all embodiments of the present invention, apparatus and methods are designed to enable the user to exploit contextual information provided within documents that have been annotated using a structured markup language such as XML. Besides locating portions of a document that appear to match the user's search criteria, embodiments of the present invention enable the user to use XML or other markup language tags, inserted into a document by the author, to help identify those documents from a potentially large set of search results that are most relevant to the original search query or, more particularly, to what the user hoped to find.

[0028] Embodiments of the present invention are largely concerned with analysis of search results, enabling a user to exploit contextual information provided by markup language annotations in documents identified in the search. Largely conventional search engines and search techniques may be used to obtain a set of search results on the basis of a user's search query. However, the otherwise conventional search engine or other information retrieval tool must be arranged not only to locate portions of documents matching a user's search query, but also to identify and return annotating tags associated with those matching portions, according to the particular markup language used. In particular, the structure of annotating tags used in a particular structured markup language must be identified and returned in the search results, preserving that tag structure for analysis by novel and inventive features of the present invention, to be described in detail below.

[0029] Search results comprise a list of references to documents found by the search engine to have portions matching the search query, for example a list of document URLs if those documents are stored on web servers and accessible over the Internet, together with the respective tag structures associated with each of the matching portions. In each embodiment, the search results are presented to the user as a list of the identified tags and tag structures together with a list of the identified document references. For example, in a hierarchical tag structure such as that used with XML, the full structure of tags surrounding a matching portion of text will be presented in the tag list, with, optionally, a list of the lowest level tags.

[0030] In a first preferred embodiment of the present invention, a user is provided with apparatus having a user interface and facilities to enable the user to navigate through the returned set of search results making use of information provided by returned tags. In particular, a user may select one or more particular tags or tag structures from the tag list presented at the user interface and, in response to that selection, the apparatus will present at the user interface, from the set of document references, a list of only those documents containing the selected tags or tag structures associated with matching text. A user may have selected a particular tag because the words used in that tag were suggestive of a context relevant to the type of information the user was seeking. Once the document list has been reduced, the apparatus adjusts the displayed tag list to include not only the selected tag or tags, but also any other tags and tag structures associated with matching text from the documents in the reduced list.

[0031] Identification of such additional tags may be highly relevant to the user because they may be suggestive of other contexts that might reveal relevant information, especially as those tags occurred in the same documents as the original tag selection. This so-called “double filtering” technique may be extended by the user by making a further selection from the adjusted tag list and further restricting or otherwise altering the list of documents being investigated.

[0032] In a second embodiment, the apparatus provides an enhancement to the tag analysis process by using a thesaurus to identify different tags within the list that may have a similar meaning, or by using clustering techniques to identify tags that may relate to similar contexts. Such related tags may then be grouped together in the tag list presented at the user interface to enable a user to see that such tags may be so related and to provide the opportunity for the user to select the group of related tags rather than individual tags or tag structures in the “double filtering” navigation process outlined above.

[0033] In a third embodiment, the apparatus provides a further enhancement to the tag analysis process by monitoring, over a period of time, the selection of tags by users from each presented tag list and monitoring any subsequent access by a user of particular documents listed in the resultant reduced document lists. The apparatus records a history of tag selection by users in general, or by a particular user or group of users, and their subsequent document retrieval activity in respect of each distinct tag and/or tag structure. This historical data is then used to weight each of the distinct tags and tag structures according to the likelihood that they resulted in a selection of documents relevant to those users. The apparatus is then able to present a given tag list in a ranking order, for example of decreasing usefulness, when particular tags known from the historical records appear in a set of search results.

[0034] In a fourth embodiment, the apparatus is arranged, on the basis of previous document access by users, to establish a profile of the typical information content of portions of documents associated with each distinct tag and tag structure. Known document summarisers and key term extractors may be used to extract such profile information each time a document is accessed by a user. Typical information content of a given tag may then be made available to users as required. This helps to overcome problems in exploiting tags when lack of standardisation has resulted in different document authors using different tags or obscure choices of tag to represent the same or a similar context in particular fields of information.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0035] There now follows a more detailed description of the preferred embodiments outlined above.

[0036] Referring to FIG. 1, an information retrieval apparatus 100 is shown according to preferred embodiments of the present invention, for use in searching for relevant information stored in file servers 105, web servers for example, and accessible over a communications network 110 such as the Internet. The information searching apparatus 100 is arranged to receive search queries supplied by users from terminal equipment 115, typically submitted using a conventional browser product installed on a user's terminal equipment 115, a web browser for example, and transmitted over the communications network 110 by means of a router 120. The information searching apparatus 100 includes a user interface 125 for receiving search queries from users (115) and for returning search results to their terminal equipment 115, a search engine 130 and a context analysis module 135. Context analysis module 135 is arranged in particular to analyse and to present, via the user interface 125, XML tag information enclosing portions of documents that were found by the search engine 130 to match the search query, in a way that enables users to exploit the contextual information provided by those tags.

[0037] Steps in operation of an information searching apparatus 100 according to a first embodiment of the present invention will now be described with reference to FIG. 2.

[0038] Referring to FIG. 2, processing begins at STEP 200 with receipt of a search query via the user interface 125. The search query specifies search criteria, such as a set of keywords or phrases, to be used in identifying potentially relevant sets of information. At STEP 205, the received search criteria are passed to the search engine 130 and the search engine 130 is activated to begin searching for relevant documents stored in file servers 105. Search engine 130 may be any one of a number of different types of known search engine arranged to use the supplied search criteria in any appropriate way to identify relevant information.

[0039] If a potentially relevant document is located by search engine 130, at STEP 210, then at STEP 215 a reference to the located document, for example a URL if the document is a web page located on a web server, is added to search results being compiled by the search engine 130. If, at STEP 220, the located document is an XML document, then at STEP 225 the located document is analysed to identify a full hierarchy of XML tags enclosing a portion of the located document containing relevant information. Preferably, search engine 130 may be adapted to carry out basic XML tag identification once it has established that the located document is an XML document. Alternatively, the context analysis module 135 may identify XML tags by direct access to a document identified by the search engine 130. Any identified XML tags are added to the search results at STEP 230, preserving the tag hierarchy. Processing then moves to STEP 235 to determine whether all accessible documents have been searched.

[0040] If, at STEP 220, the located document was not an XML document, or if at STEP 210 no relevant document was found, then processing proceeds directly to STEP 235 to determine whether all accessible documents have been searched.

[0041] At STEP 235, if all documents accessible to the search engine 130 have been searched, then at STEP 240 the compiled search results are passed to the context analysis module 135 for analysis and presentation to the initiating user via user interface 125. If documents remain to be searched at STEP 235, then processing returns to STEP 205 to continue the search for relevant information.
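
The tag identification of STEPs 220 to 230 may be illustrated by the following minimal Python sketch. It assumes that a located document is available as an XML string and that a "match" is a case-insensitive keyword occurrence in an element's own text; the function name enclosing_tag_paths and the sample document are hypothetical and form no part of the described apparatus.

```python
import xml.etree.ElementTree as ET


def enclosing_tag_paths(xml_text, keyword):
    """Return the tag hierarchy (outermost tag first) of every element
    whose own text contains the keyword."""
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    paths = []

    def walk(element, ancestors):
        path = ancestors + (element.tag,)
        # Only the text directly inside the element is tested, so a match
        # is reported against its innermost enclosing tag. Text that
        # follows a child element (its "tail") is ignored in this sketch.
        if element.text and needle in element.text.lower():
            paths.append(path)
        for child in element:
            walk(child, path)

    walk(root, ())
    return paths


# Hypothetical sample document.
sample = (
    "<CATALOGUE><PRODUCT><PRODUCT_TYPE>mobile phone</PRODUCT_TYPE>"
    "<SUMMARY>A compact phone</SUMMARY></PRODUCT></CATALOGUE>"
)
paths = enclosing_tag_paths(sample, "phone")
```

Each returned tuple preserves the full hierarchy, so a result such as ("CATALOGUE", "PRODUCT", "SUMMARY") can be added to the search results without flattening it to the lowest level tag.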

[0042] The context analysis module 135 may be arranged to provide a number of particularly useful functions, exploiting any contextual information provided by XML tags, to assist a user in navigating and selecting from a set of search results. Such functions are of particular use when search results contain a great many “hits” in response to a particular search query. Further embodiments of the present invention, to be described below, relate to the different levels of functionality that may be provided by the context analysis module 135.

[0043] According to the first embodiment of the present invention, context analysis module 135 provides a basic tag listing and grouping facility, accessible to users via the user interface 125, preserving and displaying a hierarchy of tags where more than one level of tagging was detected in a particular document. This enables search results to be grouped and selected by users for further examination according to tag group, the assumption being that tags of the same or a similar name are indicative of a similar information context. This basic tag listing and grouping function of context analysis module 135 will now be described with reference to FIG. 3.

[0044] Referring to FIG. 3, context analysis begins at STEP 300 with receipt from the search engine 130 of a set of search results. At STEP 305, all the XML tags identified in the search results are selected and an ordered list of XML tags is generated, preserving the hierarchical structure of tags where there is more than one level enclosing a relevant section of a document. At STEP 310, for each distinctly named tag and tag hierarchy, a count is made of the number of document references from the search results in which the same tag or tag hierarchy was identified. At STEP 315, the ordered tag list and associated document count is presented to the originating user via the user interface 125. At STEP 320, in addition to the tag list, a list of all the identified document references is also presented via the user interface 125 in a conventional format, for example including a document address or other reference together with, if enabled, a precis of the relevant section of each document.
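
By way of illustration only, the counting of STEP 310 and the ordered list of STEP 315 might be sketched in Python as follows. The representation of search results as a mapping from document reference to a list of tag hierarchies, and the name tag_document_counts, are assumptions made for this sketch.

```python
from collections import Counter


def tag_document_counts(results):
    """Count, for each distinct tag hierarchy, the number of document
    references in which it enclosed a matching portion."""
    counts = Counter()
    for tag_paths in results.values():
        # Converting to a set counts a document once per hierarchy, even
        # if that hierarchy enclosed several matching portions.
        counts.update(set(tag_paths))
    return counts


# Hypothetical search results: document reference -> tag hierarchies.
results = {
    "doc1.xml": [("PRODUCT", "PRODUCT_TYPE"), ("PRODUCT", "SUMMARY")],
    "doc2.xml": [("PRODUCT", "PRODUCT_TYPE"), ("PRODUCT", "PRODUCT_TYPE")],
    "doc3.xml": [("REVIEW", "SUMMARY")],
}
counts = tag_document_counts(results)
# Ordered tag list with its document counts, ready for presentation.
tag_list = sorted(counts.items())
```

Here the hierarchy ("PRODUCT", "PRODUCT_TYPE") receives a count of 2, because it was identified in two documents, although it enclosed three matching portions in total.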

[0045] At STEP 325, the context analysis module 135 is arranged to accept, via the user interface 125, user selection of any tag or group of tags from the displayed tag list, or selection of an option to exit. If the user does not want to exit, at STEP 330, then at STEP 335 the list of document references is adjusted to show references for only those documents in which one or more of the selected tags were found by the search engine 130. So, for example, if there were found at STEP 310 to be 17 documents in which a relevant portion was contained within an XML tag <PRODUCT_TYPE>, then if <PRODUCT_TYPE> were selected from the tag list at STEP 325, the user would then see at STEP 335 those 17 document references listed via the user interface.

[0046] Having presented the adjusted list of document references at STEP 335, the tag list itself is then adjusted at STEP 340 to display only those tags identified in those documents referenced in the adjusted document list. This adjustment to the tag list may bring in extra tags that were not selected at STEP 325 because in one or more of the documents containing the selected tag(s), the search engine 130 may have identified more than one potentially relevant portion, each portion being enclosed by different tags. This additional tag information can be very useful when navigating through the search results because the adjusted tag list is more likely to contain tags related in context (at least from the point of view of the user submitting the original search query) given that they occurred within documents located using the same search query. When processing returns to STEP 325 the user may select one or more of those additional tags and hence, at STEP 335, identify and view any further documents containing potentially relevant portions in the context of those additional tags.

[0047] In this way, a user may use the tag list to “drill-down” to those documents most likely to contain relevant information by navigating through tags that appear to suggest the most relevant context. Adjustment of the listed document references and, in response, of the listed tags to correspond to the listed documents, provides a double filtering mechanism that is particularly effective in helping a user to navigate through search results and select a potentially relevant subset of documents for further investigation, making full use of contextual information provided by XML tags.
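
The double filtering of STEPs 335 and 340 may be illustrated by the following Python sketch, offered under stated assumptions rather than as a definitive implementation: search results are assumed to be held as a mapping from document reference to tag hierarchies, and the names filter_documents and adjusted_tag_list are hypothetical.

```python
def filter_documents(results, selected):
    """STEP 335: keep only documents containing a selected tag hierarchy."""
    return {
        ref: paths
        for ref, paths in results.items()
        if any(path in selected for path in paths)
    }


def adjusted_tag_list(filtered):
    """STEP 340: list every tag hierarchy found in the remaining documents."""
    return sorted({path for paths in filtered.values() for path in paths})


# Hypothetical search results: document reference -> tag hierarchies.
results = {
    "doc1.xml": [("PRODUCT_TYPE",), ("SUMMARY",)],
    "doc2.xml": [("PRODUCT_TYPE",), ("PRICE",)],
    "doc3.xml": [("REVIEW",)],
}
filtered = filter_documents(results, {("PRODUCT_TYPE",)})
tags = adjusted_tag_list(filtered)
# A further selection from the adjusted list narrows the documents again.
refined = filter_documents(filtered, {("PRICE",)})
```

Selecting <PRODUCT_TYPE> reduces the document list to two references and brings <SUMMARY> and <PRICE> into the adjusted tag list, although neither was selected; a further selection of <PRICE> leaves a single document.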

[0048] At any stage, a user may expand analysis of the search results by restoring the full list of displayed tags and selecting another starting point.

[0049] Further basic sorting facilities may be provided by the context analysis module 135 according to the first embodiment. In particular, a so-called “stop list” may be used by the context analysis module 135 to eliminate particularly basic XML tags from consideration and display in a tag list. Such tags might include <CHAPTER>, <SECTION>, <PARAGRAPH>, <WORDS> and other such tags that provide only structural information about the layout of a document and little about the informational context of a portion identified by the search engine 130.

[0050] However, tags such as <SUMMARY> or <PRECIS> provide useful information about the context, within the document, of a matching word or phrase, suggesting that the matching word or phrase is more likely to be indicative of the primary information content of the document as a whole. Whether stop lists are used in relation to a particular information search is preferably an option selectable by a user via the user interface 125.
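
One possible reading of the stop list facility is sketched below in Python. The stop list contents are taken from the examples given above; the decision to strip stop-listed tags from within a hierarchy (rather than to drop the whole hierarchy), and the names STOP_LIST and apply_stop_list, are assumptions of this sketch.

```python
# Structural tags named in the description as stop list candidates.
STOP_LIST = {"CHAPTER", "SECTION", "PARAGRAPH", "WORDS"}


def apply_stop_list(tag_path, stop_list=STOP_LIST, enabled=True):
    """Remove purely structural tags from a tag hierarchy, keeping the
    order of the remaining tags. The 'enabled' flag models the
    user-selectable option to switch the stop list off."""
    if not enabled:
        return tuple(tag_path)
    return tuple(tag for tag in tag_path if tag.upper() not in stop_list)


path = ("CHAPTER", "SECTION", "SUMMARY", "PARAGRAPH")
kept = apply_stop_list(path)
unchanged = apply_stop_list(path, enabled=False)
```

With the stop list enabled, only the contextually informative <SUMMARY> tag survives from the sample hierarchy; with it disabled, the hierarchy is returned untouched.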

[0051] According to a second preferred embodiment of the present invention, there is provided an apparatus and method for enhancing the analysis and interpretation of tags and tag structures returned in search results to assist a user in recognising groups of tags having a similar meaning or relating to a similar context.

[0052] XML tags in particular are simply words. Aside from those that are standardised for XML itself, different words may be used in different XML implementations to mean largely the same thing. One author might tag part of a document as <summary> while another might tag the same part of another document as <precis>; or a section of one document might be about software agents and tagged <agents> while in another document the same tag is used to tag a section about estate agents.

[0053] According to the second preferred embodiment of the present invention, the context analysis module 135 is provided with access to a thesaurus for use in identifying synonyms and helping to disambiguate tags. A general purpose thesaurus may be used, for example one such as WordNet, as disclosed in “WordNet: An Electronic Lexical Database”, edited by Christiane Fellbaum, MIT Press, May 1998, or, for more specialised information searches, a ready-made domain-specific thesaurus may be accessed, or even created using a clustering technique (see below).

[0054] Preferably, in presenting search results using tag lists as described above with reference to FIG. 3, the context analysis module 135 may present tags in a list along with identified synonyms from the thesaurus to help clarify the context of the tag. Alternatively, tags found to be related in meaning, following reference to the thesaurus, may be grouped together in the presented tag list to enable a user to select the whole group when narrowing down the list of documents to be investigated.
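
The grouping of tags by meaning may be sketched as follows. The small hand-written table THESAURUS stands in for a real thesaurus and is purely hypothetical, as is the function name group_by_synonym; the sketch does not use WordNet or any other actual lexical database.

```python
# Hypothetical miniature thesaurus: each known tag word maps to a
# canonical term shared by its synonyms.
THESAURUS = {
    "summary": "summary",
    "precis": "summary",
    "abstract": "summary",
    "price": "price",
    "cost": "price",
}


def group_by_synonym(tags, thesaurus=THESAURUS):
    """Group tags whose names map to the same canonical term. Tags not
    found in the thesaurus form a group of their own."""
    groups = {}
    for tag in tags:
        key = thesaurus.get(tag.lower(), tag.lower())
        groups.setdefault(key, []).append(tag)
    return groups


groups = group_by_synonym(["summary", "precis", "cost", "agents"])
```

In this example <summary> and <precis> fall into one group that a user could select as a whole, while <agents>, which the table does not cover, remains on its own. A lookup of this kind cannot by itself disambiguate a tag such as <agents> that is used with two different meanings.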

[0055] In addition, or alternatively, to the use of a thesaurus, clustering techniques such as those disclosed in “Clustering Algorithms”, Rasmussen, E., in “Information Retrieval: Data Structures and Algorithms”, edited by Frakes, W. & Baeza-Yates, R., Prentice-Hall, New Jersey, USA, 1992, may be used to identify tags having a similar meaning or used in a similar context in the returned search results.

[0056] A numerical value representative of a measure of the contextual ‘similarity’ of a pair of tags Ti and Tj returned in the search results may be calculated as:

2*[Ti∩Tj]/([Ti]+[Tj])

[0057] where [Ti] and [Tj] are the number of documents in the search results in which tags Ti and Tj respectively were identified in relation to relevant information, and

[0058] [Ti∩Tj] is the number of documents in which Ti and Tj co-occur. This measure of similarity takes a value between 0 and 1, with 0 meaning that the tags share no similarity of context (no documents contain both the tags) and 1 meaning that every document in the search results that contains either tag contains both, and hence that the two tags are likely to have been used in the same information context.

[0059] A matrix of values for the above measure of context similarity is calculated for the tags and tag structures returned in a given set of search results. This matrix may then be used to identify groups of tags that may be related in context, for example by identifying a set of tags for which each combination of two tags selected from the set has a value of the similarity measure exceeding a predetermined threshold. The most similar tags may then be presented in groups for selection by a user in the tag list.
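
The similarity measure and the pairwise threshold test may be illustrated by the following Python sketch. Each tag is assumed to be represented by the set of document references in which it was identified; the names dice_similarity, similarity_matrix and is_related_group, the sample data and the threshold of 0.5 are all assumptions of the sketch.

```python
from itertools import combinations


def dice_similarity(docs_i, docs_j):
    """Compute 2*[Ti∩Tj]/([Ti]+[Tj]) for two sets of document references."""
    total = len(docs_i) + len(docs_j)
    return 2 * len(docs_i & docs_j) / total if total else 0.0


def similarity_matrix(tag_docs):
    """Similarity for every unordered pair of tags, keyed by the pair in
    alphabetical order."""
    return {
        (a, b): dice_similarity(tag_docs[a], tag_docs[b])
        for a, b in combinations(sorted(tag_docs), 2)
    }


def is_related_group(tags, matrix, threshold):
    """True if every pair of tags in the set exceeds the threshold."""
    return all(
        matrix[pair] > threshold for pair in combinations(sorted(tags), 2)
    )


# Hypothetical data: tag -> documents in which the tag was identified.
tag_docs = {
    "summary": {"d1", "d2", "d3"},
    "precis": {"d2", "d3"},
    "price": {"d4"},
}
matrix = similarity_matrix(tag_docs)
related = is_related_group({"summary", "precis"}, matrix, threshold=0.5)
```

For the sample data, <summary> and <precis> co-occur in two documents, giving 2*2/(3+2) = 0.8, which exceeds the assumed threshold, whereas <price> shares no documents with either tag and scores 0.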

[0060] According to a third preferred embodiment of the present invention there is provided an apparatus and method for monitoring tag selection and associated document access by individual users or by predetermined groups of users as the basis for weighting and ranking distinct tags. Weightings may represent the probability that a given tag or tag structure will result in a selection of documents from the search results that contains documents of relevance to the particular user or group of users.

[0061] The apparatus of the third embodiment is provided with an information access monitor for monitoring selection of tags and access to referenced documents by users. The information retrieval monitor is arranged with access to the user interface 125 to monitor all tag selections by users and any requests by users to access documents included in corresponding lists. The monitor also includes a store for recording a history of selection for each distinct tag and tag structure and for recording weightings calculated in respect of each tag.

[0062] Each time a user selects a tag from a tag list presented at the user interface 125, the monitor checks for an entry in the store for that particular tag. If no entry exists, then an entry is created for the tag. If necessary, certain “low value” words may be removed from the stored tag, or words may be stemmed to render them into a more standardised form. For each tag, a counter is maintained both for the number of times that the tag was selected and for the number of times that selection of the tag was followed by an access request by the user for a document listed in the resultant reduced document list (see STEPs 325 and 335 of FIG. 3). These counters may then be used to calculate, for each tag, a weighting representing a measure of the probability that selection of the tag results in a list containing relevant documents for that user.

[0063] The monitor may be further enhanced to monitor the duration of a document access by a user, providing further information on the relevance of the accessed document to the user. Longer duration access to documents may trigger a double increment, for example, of the second of the two counters mentioned above.
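
A minimal Python sketch of the two counters and the resulting weighting is given below. The class name TagMonitor, the ratio used as the weighting and the 60 second threshold for a "longer duration" access are assumptions made for illustration; the description does not fix any of them.

```python
class TagMonitor:
    """Keeps, per tag, a selection counter and an access counter."""

    def __init__(self, long_access_seconds=60.0):
        self.selections = {}
        self.accesses = {}
        # Assumed threshold above which an access counts double.
        self.long_access_seconds = long_access_seconds

    def record_selection(self, tag):
        # An entry is created the first time a tag is selected.
        self.selections[tag] = self.selections.get(tag, 0) + 1
        self.accesses.setdefault(tag, 0)

    def record_access(self, tag, duration_seconds=0.0):
        # A longer duration access triggers a double increment.
        increment = 2 if duration_seconds >= self.long_access_seconds else 1
        self.accesses[tag] = self.accesses.get(tag, 0) + increment

    def weighting(self, tag):
        """Accesses per selection; 0.0 for a tag never selected."""
        selected = self.selections.get(tag, 0)
        return self.accesses.get(tag, 0) / selected if selected else 0.0

    def ranked(self, tags):
        """Tags in order of decreasing weighting."""
        return sorted(tags, key=self.weighting, reverse=True)


monitor = TagMonitor()
for _ in range(4):
    monitor.record_selection("SUMMARY")
monitor.record_access("SUMMARY", duration_seconds=5)
monitor.record_access("SUMMARY", duration_seconds=5)
monitor.record_selection("PRICE")
ranking = monitor.ranked(["PRICE", "SUMMARY"])
```

In this sketch <SUMMARY> obtains a weighting of 0.5 (two accesses after four selections) and is ranked above <PRICE>. Where double increments are used the ratio can exceed 1, so it is better regarded as a relevance score than as a strict probability.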

[0064] Operation of the information retrieval monitor described above may be triggered each time a new set of results is returned in response to a search query and the initial tag list is presented at the user interface 125. Weightings may be recalculated each time a user accesses a document so that they are immediately available for use in ranking each presented tag list.

[0065] In an alternative ranking method, a user profile of keywords or terms may be stored in respect of each user of the apparatus. Such a profile may be used to represent the interests of a user and particularly contextual information of relevance to that user's interests. A known measure of relevance may be calculated for each tag in a tag list with respect to the words and terms in the user profile. The measure of relevance may be used to rank the tags in the list in order of relevance to the user profile as a further assistance to a user in selecting tags most likely to result in an efficient navigation of a set of search results leading to a list of the most relevant documents from the search.
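
The description leaves the measure of relevance open. Purely as an illustration, the following Python sketch assumes a simple word overlap (Jaccard) measure between the words of a tag name, split on underscores, and the terms of a user profile; the names relevance and rank_by_profile and the sample profile are hypothetical.

```python
def relevance(tag, profile):
    """Jaccard overlap between the words of a tag name and a set of
    lower-case profile terms (an assumed, illustrative measure)."""
    words = set(tag.lower().split("_"))
    union = words | profile
    return len(words & profile) / len(union) if union else 0.0


def rank_by_profile(tags, profile):
    """Tags in order of decreasing relevance to the user profile."""
    terms = {term.lower() for term in profile}
    return sorted(tags, key=lambda tag: relevance(tag, terms), reverse=True)


# Hypothetical user profile and tag list.
profile = {"product", "price"}
ranked = rank_by_profile(["AUTHOR", "PRODUCT_TYPE", "PRICE"], profile)
```

With the sample profile, <PRICE> scores 1/2, <PRODUCT_TYPE> scores 1/3 and <AUTHOR> scores 0, so the tags are presented in that order.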

[0066] According to a fourth preferred embodiment, known document summarisers and key term extractors may be used to accumulate a profile of information content typically associated with each of a set of distinct tags, for example the tags stored by the information retrieval monitor of the third embodiment described above.

[0067] Each time a user accesses a particular document, key terms indicative of the information content of the matching portion of that document may be extracted and stored in association with the particular tag selection that preceded access of that document. Such terms may be further summarised to build up a profile of a tag for presentation to users as required. This feature provides further assistance to users in understanding the intended meaning of a tag, particularly in the absence of standardised use of tags.
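
The accumulation of a tag profile may be sketched in Python as follows. A naive word frequency count stands in for the known key term extractors referred to above; that substitution, the word list COMMON_WORDS and the names extract_key_terms and TagProfiles are assumptions of the sketch.

```python
import re
from collections import Counter, defaultdict

# Assumed list of low value words to ignore.
COMMON_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def extract_key_terms(text, limit=5):
    """Naive stand-in for a key term extractor: the most frequent words
    in the text, ignoring common words."""
    words = [
        word
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in COMMON_WORDS
    ]
    return [term for term, _ in Counter(words).most_common(limit)]


class TagProfiles:
    """Accumulates key terms for each tag as documents are accessed."""

    def __init__(self):
        self.terms = defaultdict(Counter)

    def record_access(self, tag, matching_text):
        self.terms[tag].update(extract_key_terms(matching_text))

    def profile(self, tag, limit=5):
        counts = self.terms.get(tag, Counter())
        return [term for term, _ in counts.most_common(limit)]


profiles = TagProfiles()
profiles.record_access("agents", "Software agents negotiate for the user")
profiles.record_access("agents", "Mobile software agents")
top_terms = profiles.profile("agents", limit=2)
```

After the two sample accesses, the terms "software" and "agents" dominate the profile of the <agents> tag, indicating to a user that the tag has been used in this collection for software agents rather than estate agents.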

1. A method of accessing sets of information stored in an information system, wherein portions of said sets of information are enclosed by tags of a hierarchical tag structure defined according to a structured mark-up language, the method comprising the steps of: (i) generating a search query comprising specified search criteria; (ii) identifying portions of said sets of information matching said specified search criteria, and outputting a list of references to said identified sets of information; (iii) identifying, for each matching portion identified at step (ii), an enclosing tag structure and outputting a list of said identified tag structures; (iv) receiving a selection signal specifying a tag structure from the list output at step (iii); (v) adjusting said list of references from step (ii) to comprise references only to said identified sets of information that contain the tag structure selected at step (iv); (vi) adjusting said list of tag structures to comprise tag structures contained in information sets referenced in said adjusted list at step (v); and (vii) repeating step (iv) in respect of said adjusted list of tag structures, and step (v) to identify a more specific list of references to sets of information.
2. A method according to claim 1, wherein step (iii) further includes identifying, using a thesaurus, groups of two or more of said tag structures containing tags having a similar meaning, and wherein at step (iv), selecting a tag structure includes selecting one of said groups of tag structures.
3. A method according to claim 1, including the steps of: (viii) detecting, following receipt of the selection signal at step (iv), a request for access to a corresponding set of information listed in step (v); (ix) updating, in respect of the tag structure selected at step (iv), a weighting value representative of the probability that selection of the tag structure led to a request for access to a corresponding set of information; and (x) outputting an ordered list of the tag structures identified at step (iii) according to their respective weighting values.